Achieving ubiquitous high-accuracy localization is crucial for next-generation wireless systems, yet remains challenging in multipath-rich urban environments. By exploiting the fine-grained multipath characteristics embedded in channel state information (CSI), more reliable and precise localization can be achieved. To address this, we present CMANet, a multi-BS cooperative positioning architecture that performs feature-level fusion of raw CSI using the proposed Channel Masked Attention (CMA) mechanism. The CMA encoder injects a physically grounded prior--per-BS channel gain--into the attention weights, thus emphasizing reliable links and suppressing spurious multipath. A lightweight LSTM decoder then treats subcarriers as a sequence to accumulate frequency-domain evidence into a final 3D position estimate. In a typical 5G NR-compliant urban simulation, CMANet achieves less than 0.5m median error and 1.0m 90th-percentile error, outperforming state-of-the-art benchmarks. Ablations verify the necessity of CMA and frequency accumulation. CMANet is edge-deployable and exemplifies an Integrated Sensing and Communication (ISAC)-aligned, cooperative paradigm for multi-BS CSI positioning.
High-precision three-dimensional (3D) positioning in dense urban non-line-of-sight (NLOS) environments benefits significantly from cooperation among multiple distributed base stations (BSs). However, forwarding raw CSI from multiple BSs to a central unit (CU) incurs prohibitive fronthaul overhead, which limits scalable cooperative positioning in practice. This paper proposes a learning-based edge-cloud cooperative positioning framework under limited-capacity fronthaul constraints. In the proposed architecture, a neural network is deployed at each BS to compress the locally estimated CSI into a quantized representation subject to a fixed fronthaul payload. The quantized CSI is transmitted to the CU, which performs cooperative 3D positioning by jointly processing the compressed CSI received from multiple BSs. The proposed framework adopts a two-stage training strategy consisting of self-supervised local training at the BSs and end-to-end joint training for positioning at the CU. Simulation results based on a 3.5~GHz 5G NR compliant urban ray-tracing scenario with six BSs and 20~MHz bandwidth show that the proposed method achieves a mean 3D positioning error of 0.48~m and a 90th-percentile error of 0.83~m, while reducing the fronthaul payload to 6.25% of lossless CSI forwarding. The achieved performance is close to that of cooperative positioning with full CSI exchange.
The rapid advancement of connected and autonomous vehicles has driven a growing demand for precise and reliable positioning systems capable of operating in complex environments. Meeting these demands requires an integrated approach that combines multiple positioning technologies, including wireless-based systems, perception-based technologies, and motion-based sensors. This paper presents a comprehensive survey of wireless-based positioning for vehicular applications, with a focus on satellite-based positioning (such as global navigation satellite systems (GNSS) and low-Earth-orbit (LEO) satellites), cellular-based positioning (5G and beyond), and IEEE-based technologies (including Wi-Fi, ultrawideband (UWB), Bluetooth, and vehicle-to-vehicle (V2V) communications). First, the survey reviews a wide range of vehicular positioning use cases, outlining their specific performance requirements. Next, it explores the historical development, standardization, and evolution of each wireless positioning technology, providing an in-depth categorization of existing positioning solutions and algorithms, and identifying open challenges and contemporary trends. Finally, the paper examines sensor fusion techniques that integrate these wireless systems with onboard perception and motion sensors to enhance positioning accuracy and resilience in real-world conditions. This survey thus offers a holistic perspective on the historical foundations, current advancements, and future directions of wireless-based positioning for vehicular applications, addressing a critical gap in the literature.
In addition to satellite systems, carrier phase positioning (CPP) is gaining attraction also in terrestrial mobile networks, particularly in 5G New Radio evolution toward 6G. One key challenge is to resolve the integer ambiguity problem, as the carrier phase provides only relative position information. This work introduces and studies a multi-band CPP scenario with intra- and inter-band carrier aggregation (CA) opportunities across FR1, mmWave-FR2, and emerging 6G FR3 bands. Specifically, we derive multi-band CPP performance bounds, showcasing the superiority of multi-band CPP for high-precision localization in current and future mobile networks, while noting also practical imperfections such as clock offsets between the user equipment (UE) and the network as well as mutual clock imperfections between the network nodes. A wide collection of numerical results is provided, covering the impacts of the available carrier bandwidth, number of aggregated carriers, transmit power, and the number of network nodes or base stations. The offered results highlight that only two carriers suffice to substantially facilitate resolving the integer ambiguity problem while also largely enhancing the robustness of positioning against imperfections imposed by the network-side clocks and multi-path propagation. In addition, we also propose a two-stage practical estimator that achieves the derived bounds under all realistic bandwidth and transmit power conditions. Furthermore, we show that with an additional search-based refinement step, the proposed estimator becomes particularly suitable for narrowband Internet of Things applications operating efficiently even under narrow carrier bandwidths. Finally, both the derived bounds and the proposed estimators are extended to scenarios where the bands assigned to each base station are nonuniform or fully disjoint, enhancing the practical deployment flexibility.
Multipath-based simultaneous localization and mapping (MP-SLAM) is a promising approach for future 6G networks to jointly estimate the positions of transmitters and receivers together with the propagation environment. In cooperative MP-SLAM, information collected by multiple mobile terminals (MTs) is fused to enhance accuracy and robustness. Existing methods, however, typically assume perfectly synchronized base stations (BSs) and orthogonal transmission sequences, rendering inter-BS interference at the MTs negligible. In this work, we relax these assumptions and address simultaneous source separation, synchronization, and mapping. A relevant example arises in modern 5G systems, where BSs employ muting patterns to mitigate interference, yet localization performance still degrades. We propose a novel BS-dependent data association and synchronization bias model, integrated into a joint Bayesian framework and inferred via the sum-product algorithm on a factor graph. The impact of joint synchronization and source separation is analyzed under various system configurations. Compared with state-of-the-art cooperative MP-SLAM assuming orthogonal and synchronized BSs, our statistical analysis shows no significant performance degradation. The proposed BS-dependent data association model constitutes a principled approach for classifying features by arbitrary properties, such as reflection order or feature type (scatterers versus walls).
Channel-state information (CSI)-based sensing will play a key role in future cellular systems. However, no CSI dataset has been published from a real-world 5G NR system that facilitates the development and validation of suitable sensing algorithms. To close this gap, we publish three real-world wideband multi-antenna multi-open RAN radio unit (O-RU) CSI datasets from the 5G NR uplink channel: an indoor lab/office room dataset, an outdoor campus courtyard dataset, and a device classification dataset with six commercial-off-the-shelf (COTS) user equipments (UEs). These datasets have been recorded using a software-defined 5G NR testbed based on NVIDIA Aerial RAN CoLab Over-the-Air (ARC-OTA) with COTS hardware, which we have deployed at ETH Zurich. We demonstrate the utility of these datasets for three CSI-based sensing tasks: neural UE positioning, channel charting in real-world coordinates, and closed-set device classification. For all these tasks, our results show high accuracy: neural UE positioning achieves 0.6cm (indoor) and 5.7cm (outdoor) mean absolute error, channel charting in real-world coordinates achieves 73cm mean absolute error (outdoor), and device classification achieves 99% (same day) and 95% (next day) accuracy. The CSI datasets, ground-truth UE position labels, CSI features, and simulation code are publicly available at https://caez.ethz.ch
Wireless foundation models (WFMs) have recently demonstrated promising capabilities, jointly performing multiple wireless functions and adapting effectively to new environments. However, while current WFMs process only one modality, depending on the task and operating conditions, the most informative modality changes and no single modality is best for all tasks. WFMs should therefore be designed to accept multiple modalities to enable a broader and more diverse range of tasks and scenarios. In this work, we propose and build the first multimodal wireless foundation model capable of processing both raw IQ streams and image-like wireless modalities (e.g., spectrograms and CSI) and performing multiple tasks across both. We introduce masked wireless modeling for the multimodal setting, a self-supervised objective and pretraining recipe that learns a joint representation from IQ streams and image-like wireless modalities. We evaluate the model on five tasks across both modality families: image-based (human activity sensing, RF signal classification, 5G NR positioning) and IQ-based (RF device fingerprinting, interference detection/classification). The multimodal WFM is competitive with single-modality WFMs, and in several cases surpasses their performance. Our results demonstrates the strong potential of developing multimodal WFMs that support diverse wireless tasks across different modalities. We believe this provides a concrete step toward both AI-native 6G and the vision of joint sensing, communication, and localization.
This study analyzes the performance of positioning techniques based on configuration changes of 5G New Radio signals. In 5G networks, a terminal position is determined from the Time of Arrival of Positioning Reference Signals transmitted by base stations. We propose an algorithm that improves TOA accuracy under low sampling rate constraints and implement 5G PRS for positioning in a software defined modem. We also examine how flexible time frequency resource allocation of PRS affects TOA estimation accuracy and discuss optimal PRS configurations for a given signal environment.
This paper presents an autonomous sensing frame- work for identifying and localizing multiple users in Fifth Generation (5G) networks using an Unmanned Aerial Vehicle (UAV) that is not part of the serving access network. Unlike conventional aerial serving nodes, the proposed UAV operates passively and is dedicated solely to sensing. It captures Uplink (UL) Sounding Reference Signals (SRS), and requires virtually no coordination with the network infrastructure. A complete signal processing chain is proposed and developed, encompassing synchronization, user identification, and localization, all executed onboard UAV during flight. The system autonomously plans and adapts its mission workflow to estimate multiple user positions within a single deployment, integrating flight control with real-time sensing. Extensive simulations and a full-scale low- altitude experimental campaign validate the approach, showing localization errors below 3 m in rural field tests and below 8 m in urban simulation scenarios, while reliably identifying each user. The results confirm the feasibility of infrastructure-independent sensing UAVs as a core element of the emerging Low Altitude Economy (LAE), supporting situational awareness and rapid deployment in emergency or connectivity-limited environments.
Integrated Sensing and Communication (ISAC) has emerged as a promising solution in addressing the challenges of high-mobility scenarios in 5G NR Vehicle-to-Infrastructure (V2I) communications. This paper proposes a novel sensing-assisted handover framework that leverages ISAC capabilities to enable precise beamforming and proactive handover decisions. Two sensing-enabled handover triggering algorithms are developed: a distance-based scheme that utilizes estimated spatial positioning, and a probability-based approach that predicts vehicle maneuvers using interacting multiple model extended Kalman filter (IMM-EKF) tracking. The proposed methods eliminate the need for uplink feedback and beam sweeping, thus significantly reducing signaling overhead and handover interruption time. A sensing-assisted NR frame structure and corresponding protocol design are also introduced to support rapid synchronization and access under vehicular mobility. Extensive link-level simulations using real-world map data demonstrate that the proposed framework reduces the average handover interruption time by over 50%, achieves lower handover rates, and enhances overall communication performance.